GridSample: an R package to generate household survey primary sampling units (PSUs) from gridded population data

Authors

  • Dana R. Thomson
  • Forrest R. Stevens
  • Nick W. Ruktanonchai
  • Andrew J. Tatem
  • Marcia C. Castro
Abstract

BACKGROUND Household survey data are collected by governments, international organizations, and companies to prioritize policies and allocate billions of dollars. Survey samples are typically selected from recent census data; however, census data are often outdated or inaccurate. This paper describes how gridded population data might instead be used as a sample frame, and introduces the R GridSample algorithm for selecting primary sampling units (PSUs) for complex household surveys with gridded population data. Given a gridded population dataset and the geographic boundary of the study area, GridSample uses a two-step process: it first samples "seed" cells with probability proportionate to estimated population size, then "grows" PSUs until a minimum population is achieved in each PSU. The algorithm permits stratification and oversampling of urban or rural areas. The approximately uniform size and shape of grid cells allows for spatial oversampling, which is not possible in typical surveys, possibly improving small area estimates with survey results.

RESULTS We replicated the 2010 Rwanda Demographic and Health Survey (DHS) in GridSample by sampling the WorldPop 2010 UN-adjusted 100 m × 100 m gridded population dataset, stratifying by Rwanda's 30 districts, and oversampling in urban areas. The 2010 Rwanda DHS had 79 urban PSUs and 413 rural PSUs, with an average PSU population of 610 people. An equivalent sample in GridSample had 75 urban PSUs and 405 rural PSUs, with a median PSU population of 612 people. The number of PSUs differed because DHS added urban PSUs from specific districts while GridSample reallocated rural-to-urban PSUs across all districts.

CONCLUSIONS Gridded population sampling is a promising alternative to typical census-based sampling when census data are moderately outdated or inaccurate. Four approaches to implementation have been tried: (1) using gridded PSU boundaries produced by GridSample, (2) manually segmenting gridded PSUs using satellite imagery, (3) non-probability sampling (e.g. random-walk, "spin-the-pen"), and (4) random sampling of households. Gridded population sampling is in its infancy, and further research is needed to assess its accuracy and feasibility. The GridSample R algorithm can be used to advance this research agenda.
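The two-step selection described in the abstract (sample "seed" cells with probability proportionate to population, then "grow" each PSU until it reaches a minimum population) can be illustrated with a small sketch. This is not the GridSample R package or its API: the function names `sample_seed_cells`, `grow_psu`, and `select_psus` are hypothetical, the grid is a plain dictionary of cell coordinates to population counts, and the growth rule (breadth-first over 4-connected neighbours) is an assumption made for illustration, since the abstract does not specify how neighbouring cells are added. Stratification and urban oversampling are left out.

```python
import random
from collections import deque


def sample_seed_cells(pop, n_seeds, rng):
    """Draw distinct seed cells, each draw weighted by cell population."""
    cells = [cell for cell, p in pop.items() if p > 0]
    if n_seeds > len(cells):
        raise ValueError("more seeds requested than populated cells")
    seeds = []
    for _ in range(n_seeds):
        weights = [pop[cell] for cell in cells]
        pick = rng.choices(cells, weights=weights, k=1)[0]
        seeds.append(pick)
        cells.remove(pick)  # sample without replacement
    return seeds


def grow_psu(pop, seed, min_pop, taken):
    """Grow a PSU from `seed` until its population reaches `min_pop`.

    Assumed rule: add 4-connected neighbours in breadth-first order,
    skipping cells in `taken`. Growth stops early if no free cells remain.
    """
    members = [seed]
    seen = {seed}
    total = pop[seed]
    queue = deque([seed])
    while queue and total < min_pop:
        r, c = queue.popleft()
        for dr, dc in ((0, 1), (1, 0), (0, -1), (-1, 0)):
            nb = (r + dr, c + dc)
            if nb in pop and nb not in seen and nb not in taken:
                seen.add(nb)
                members.append(nb)
                total += pop[nb]
                queue.append(nb)
                if total >= min_pop:
                    break
    return members, total


def select_psus(pop, n_psus, min_pop, seed=0):
    """Select seeds, then grow one non-overlapping PSU around each."""
    rng = random.Random(seed)
    seeds = sample_seed_cells(pop, n_psus, rng)
    taken = set(seeds)  # reserve every seed so PSUs cannot absorb each other
    psus = []
    for s in seeds:
        members, total = grow_psu(pop, s, min_pop, taken)
        taken.update(members)
        psus.append((members, total))
    return psus


if __name__ == "__main__":
    # Toy 20 x 20 grid with made-up populations (not WorldPop data).
    toy_rng = random.Random(7)
    toy_grid = {(r, c): toy_rng.randint(0, 50)
                for r in range(20) for c in range(20)}
    for members, total in select_psus(toy_grid, n_psus=5, min_pop=600):
        print(len(members), "cells, population", total)
```

In this sketch a PSU can fall short of `min_pop` if it is boxed in by previously assigned cells; a production implementation would need a rule for that case, which the abstract does not describe.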

Related articles

Accounting for Multi-stage Sample Designs in Complex Sample Variance

Nationally representative samples of large populations often have complex design features for a variety of reasons (e.g., cost efficiency). For purposes of estimating sampling variances based on complex multi-stage sample designs involving stratification and cluster sampling, the sampling error codes provided by survey organizations in public use survey data files often assume “ultimate cluster...

Household food diversity and nutritional status among adults in Brazil

BACKGROUND The aims of this study were to evaluate whether a diversity of healthy foods in a household would decrease the availability of unhealthy foods and to evaluate the association between a healthy dietary diversity score (DDS) and nutritional status among adults. METHODS Data from the 2002-2003 Brazilian Household Budget Survey were used. This nationwide survey used a two-stage samplin...

Using Contact History Information to Adjust for Nonresponse in the Current Population Survey

The Current Population Survey (CPS) adjusts the sampling weights for nonresponse to match population controls based on cells which combine similar primary sampling units (PSU) based on size and urbanicity. The adjustment method assumes that the nonresponse is random within the adjustment cells. This adjustment increases weights for responding units in PSUs with higher nonresponse. The present s...

Household Coverage of Fortified Staple Food Commodities in Rajasthan, India

A spatially representative statewide survey was conducted in Rajasthan, India to assess household coverage of atta wheat flour, edible oil, and salt. An even distribution of primary sampling units (PSUs) was selected based on their proximity to centroids on a hexagonal grid laid over the survey area. A sample of n = 18 households from each of m = 252 PSUs was taken. Demographic...

The Challenge of Redesigning the Consumer Price Index Area Sample

The U.S. Consumer Price Index (CPI) uses a multistage sample design which is revised approximately every ten years. The first stage consists of selecting primary sampling units (PSUs). The selected PSUs are also used in the Consumer Expenditure Survey (CE) and the Consumer Point of Purchase Survey (CPOPS). This paper describes the recently completed PSU selection process for the 1998 CPI Revis...

Journal title:

Volume 16, Issue

Pages -

Publication date: 2017